Several applications of latent trait models in
reading research are discussed, their nature reviewed, and two
experiments presented as heuristic applications of the Rasch Model.
Two studies of children's comprehension of selected anaphoric
structures in prose were performed using both conventional and Rasch
Model analyses. Twenty-six children (grades 2, 4, and 6) in the first
study and 91 children (grades 2, 4, and 6) in the second study were
asked to read 16 passages containing the pronoun "it." After reading
each passage, the subjects responded to a question requiring
identification of the pronoun's referent, and the responses were
scored right or wrong. Both conventional and Rasch Model scoring
procedures revealed that: (1) variances for the three grades were
equal in the first study; and (2) no significant sex effect was
identified in the second study. Both studies demonstrated that
benefits can be derived by utilizing Rasch Model measurement
procedures in reading research. Progress in reading research depends
upon replication of findings across various studies, and the Rasch
Model facilitates this process because item (or person) calibrations
are sample (or item) free. (PL)

Abstract

Recent literature on reading processes reflects improved use of research
methodologies. Nevertheless, several improvements can still be achieved. For
example, progress in realizing better understanding of psycholinguistic
processes might be facilitated by the use of latent trait measurement models.
The features of these models are discussed intuitively. Two heuristic
applications of a latent trait model are presented. Subjects in both these
experiments were elementary school children. Both studies examined the impacts
of selected variations in the anaphoric structures of prose. Finally, some
potential advantages of the latent trait measurement models are discussed.

The last decade has witnessed a dramatic growth in understanding of reading
processes. Today, there is less of an inverse relationship "between the size
and complexity of the linguistic unit being studied and the amount of research
devoted to that unit" (Thorndyke, 1975, p. 1). An important benchmark of this
progress is the theory advanced by researchers such as Anderson and Bower
(1973), van Dijk (1973), Kintsch (1974), Frederiksen (1975), Rumelhart (1975),
and Anderson, Reynolds, Schallert, and Goetz (1975).

Several factors have facilitated this progress. But these gains have been
realized at least partly because reading researchers have become more
sophisticated in the methodologies which they employ. In particular,
improvements have occurred in the application of analysis of variance (ANOVA)
procedures, procedures which are frequently applied in this area of inquiry.
For example, Clark (1973) and later Coleman and Morris (1978) have influenced
the selection of the error terms which are used to evaluate treatment effects.
Marascuilo and Levin (1970, 1976) have pointed out the importance of avoiding
Type IV errors when conducting tests of certain hypotheses.

Notwithstanding past improvements in methodological practice, however, some
additional improvements remain desirable. Morrow and Frankiewicz (1979) have
identified certain errors which continue to be made in some applications of
ANOVA and ANOVA analogues. Also, the myth that analysis of covariance can
always magically equalize non-equivalent control groups has not yet fully been
dispelled (Campbell & Erlebacher, 1975). Finally, researchers have yet to give
adequate attention to "power" considerations when reporting their work (Cohen &
Hyman, 1979). An even more fundamental error, however, is manifest in the
measurement which is performed in some reading research. Unfortunately, even
the most elaborate test statistics cannot rescue a study from the pitfalls of
inappropriate measurement.

These limitations might be avoided if researchers made more use of "latent
trait" measurement models. These models have been usefully applied in myriad
content areas, including intelligence testing (Anderson, Kearney, & Everett,
1968), the preparation of Civil Service examinations (Durovic, 1970), and
mathematics testing (Connolly, Nachtman, & Pritchett, 197?). Other example
applications of latent trait models have been listed by Rentz and Rentz (1978).
However, applications of latent trait models in reading research remain rare
indeed. The few exceptions to this rule include studies by Rentz and Bashaw
(1977) and by Andrich and Godfrey [year illegible].

The purpose of this paper is to discuss several applications of latent
trait models in reading research. Specifically, the paper reviews on an
intuitive level the nature of latent trait models, and presents two experiments
as heuristic applications of one latent trait method, the Rasch model.

Overview of latent trait models

Latent trait theory proposes that the abilities of tested subjects are
latent in their test item responses, but can be estimated by specifying the
nature of the relationship between observed performance and the unobserved
traits which are presumed to underlie performance. Several latent trait models
have been delineated, including models proposed by Lord (1952) and by Birnbaum
(1968). However, probably the most widely known and most frequently applied
latent trait model is the model proposed by Rasch (1960), and it is the Rasch
model which is discussed and applied in this paper. A more complete discussion
of latent trait theory and other latent trait models is available elsewhere (cf.
Hambleton, Swaminathan, Cook, Eignor, & Gifford, 1978).

The logic of the Rasch model is quite straightforward. As Wright and Stone
(1979, p. xiii) explain, the model assumes that success on any test item is
"entirely governed by the difference between the ability of the person and the
difficulty of the item. Nothing more. The more able the person, the better
their [sic] chances for success with any item. The easier the item, the more
likely any person is to solve it. It is as simple as that." The mathematics
necessary for estimating the latent difficulty of each test item and the latent
ability of each tested subject are not quite so simple, although reasonable
approximations of estimates can be calculated by hand if the researcher does not
have access to an appropriate computer program (cf. Wright & Stone, 1979,
chapter 2). However, three aspects of the model are noteworthy.

The model is orderly. Other measurement approaches can posit that the more
able of two persons is always more likely to succeed on any given item, or that
any given person is always more likely to succeed on the easier of any two
items. However, the Rasch model requires that these assumptions be made, and
more importantly provides test statistics which can be employed to evaluate
deviations from expected performance by either persons or items.
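
The orderliness just described can be illustrated with the logistic form in
which the Rasch model is commonly written, P(success) = exp(b - d) / (1 +
exp(b - d)), where b is the person's ability and d is the item's difficulty.
The following is a minimal sketch only; the function name and the example
values are assumptions introduced for illustration and are not taken from the
studies reported in this paper.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of success under the Rasch model (both values in logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Hypothetical values chosen only for illustration.
abilities = [-1.0, 0.0, 1.5]       # ordered from least to most able
difficulties = [-0.5, 0.5, 2.0]    # ordered from easiest to hardest

for d in difficulties:
    probs = [rasch_probability(b, d) for b in abilities]
    # The more able person is always more likely to succeed on a given item.
    assert probs == sorted(probs)

for b in abilities:
    probs = [rasch_probability(b, d) for d in difficulties]
    # Any given person is always more likely to succeed on the easier item.
    assert probs == sorted(probs, reverse=True)
```

When ability equals difficulty the function returns .5, so a person is as
likely to pass as to fail an item located at his or her own ability level.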

The model is also objective. When conventional measurement procedures are
used, item difficulty estimates are not invariant across different samples of
subjects, and the ability scores assigned to subjects are not invariant across
different tests. However, the Rasch model does generate both sample-free item
calibrations and test-free person ability scores. The importance of these kinds
of estimates at first may be difficult to comprehend, but the magnitude of this
contribution has been recognized by researchers such as Loevinger (1965,
p. 151), who noted that "Rasch must be credited with an outstanding contribution
to one of the two central psychometric problems, the achievement of nonarbitrary
measures."
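
One way to see what "sample-free" means is to note that, under the logistic
form of the Rasch model, the log-odds of success equal ability minus
difficulty, so the gap in log-odds between two items does not depend on who
is tested. The sketch below is illustrative only; the function name, the two
item difficulties, and the three ability values are hypothetical.

```python
import math

def log_odds_of_success(ability, difficulty):
    """Log-odds of success under the Rasch model: ability minus difficulty."""
    p = 1.0 / (1.0 + math.exp(-(ability - difficulty)))
    return math.log(p / (1.0 - p))

# Hypothetical difficulties for two items and abilities for three persons.
easy_item, hard_item = -0.8, 1.1
for ability in (-2.0, 0.0, 2.0):
    gap = (log_odds_of_success(ability, easy_item)
           - log_odds_of_success(ability, hard_item))
    # The gap equals hard_item - easy_item whatever the person's ability.
    print(round(gap, 6))
```

Under conventional scoring, by contrast, the proportion of a sample answering
each item correctly shifts with the ability of the sample that happens to be
tested.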

Finally, the calibrations generated by the model are truly interval.
Reading researchers frequently require subjects to read prose or to memorize
words or symbols. Each subsequent task, e.g., recall, closure, etc., is
typically then scored "1" for a success or "0" for a failure. Next, scores are
summed across items for each subject in order to arrive at an aggregate unit of
analysis.

One problem with this process is that the difficulties of the items are
presumed to be at least approximately the same. This assumption means that the
item scores can legitimately be summed to provide a total test score.
Unfortunately, most researchers rarely test how well this assumption applies for
a given data set. The importance of the violation of this additivity assumption
will be demonstrated in at least one of the two experiments reported here.
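
The interval claim can be illustrated in the simplest case. If every item
really were equally difficult (the assumption questioned above), the logistic
form of the Rasch model would place a raw score of r out of L items at
ln(r / (L - r)) on the ability scale. The sketch below, which assumes that
simplified case and uses a hypothetical function name, shows that equal gains
in raw score do not correspond to equal gains on that scale.

```python
import math

def logit_ability(raw_score, n_items):
    """Ability in logits for a test of equally difficult items centered at
    zero; undefined for zero or perfect scores."""
    if not 0 < raw_score < n_items:
        raise ValueError("raw score must lie strictly between 0 and n_items")
    return math.log(raw_score / (n_items - raw_score))

n_items = 16
# A one-point raw-score gain near the middle of the scale versus near the top.
middle_gain = logit_ability(9, n_items) - logit_ability(8, n_items)
top_gain = logit_ability(15, n_items) - logit_ability(14, n_items)
print(round(middle_gain, 2), round(top_gain, 2))
```

The one-point gain near the top of the raw-score range is larger on the
ability scale than the same gain near the middle, which is one reason summed
right-wrong scores need not behave as interval measures.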

Heuristic applications of the model 

Two studies of children's comprehension of selected anaphoric structures in
prose were performed to demonstrate some applications of the Rasch model in
reading research. Both conventional and Rasch model analyses were performed in
both studies so that a comparison of methods would be facilitated. Different
children served as subjects in the two studies. However, the subjects in both
studies were native English speaking working class children in grades two, four,
and six. Subjects were excluded from the study if their standardized reading
test scores were substantially below average.

Both studies investigated the comprehension of pronouns embedded in
passages with different structures. This area of investigation is currently
receiving considerable attention (cf. Richek, 1977). For example, Bormuth, et
al. (1970) presented fourth grade students with short passages containing
pronouns embedded in different structures, and then identified a hierarchy of
difficulty for the various structures. However, Lesgold (1974) challenged this
hierarchy in a study which produced somewhat different results. Of course, some
variation in findings should be expected, since the background knowledge of
subjects and the semantic content of passages can interact and override the
influence of syntactic passage features (cf. Rumelhart, 1977).

At least three variations in the presentation of a pronoun in a passage can
be identified. First, a pronoun's referent can either precede or follow the
pronoun. Chomsky's (1969) research suggests that forward structures are easier
for young children to comprehend orally. Second, a pronoun's referent may
either be within the same sentence or be within another sentence. Third, a
pronoun's referent may either be a noun phrase or a longer clause or sentence.

Although the wording and content of the passages used in the studies
varied, in both studies the subjects were asked to read 16 passages containing
the pronoun "it." After reading each passage, the subjects responded to a
question requiring the identification of the pronoun's referent, and the
responses were scored right-wrong according to whether the correct referent was
identified or a distractor item was chosen. Both studies utilized two passages
representing each of the structure combinations presented in Table 1.

Insert Table 1 about here.

Experiment I 

The subjects in the first experiment were 26 children from each of the
grades, grades two, four, and six. The global null hypothesis of the study was
that there would be no statistically significant differences among the three
mean test scores of the children in the three different grade levels. After the
data were collected, the data were analyzed to determine if any items or any
subjects behaved in a manner which deviated substantially from Rasch model
expectations. No subjects and no items were identified as model
"misfits," i.e., none deviated substantially from expected behavior. Consequently,
sample-free item difficulties and test-free person ability estimates could be
and were derived using all 16 test items and all 78 subjects.

In order to provide a direct comparison between conventional and Rasch
model scoring procedures, the tests were scored in two ways. The tests were
scored by counting the number of right answers each person selected. The tests
were also scored by cumulating the sample-free item difficulty estimates for
each item which each person correctly answered, after a constant was added to
the difficulty estimates so that none were negative.
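
The two scoring procedures can be sketched as follows. This is an
illustration under stated assumptions, not the scoring program used in the
study: the function names and the four difficulty values are hypothetical,
and the size of the added constant is not reported in the text, so the sketch
simply shifts the estimates by the magnitude of the most negative one.

```python
def conventional_score(responses):
    """Number of right answers (responses are 1 = right, 0 = wrong)."""
    return sum(responses)

def difficulty_weighted_score(responses, difficulties):
    """Sum of shifted item difficulties over the items answered correctly."""
    # Assumed constant: just large enough that no difficulty is negative.
    shift = max(0.0, -min(difficulties))
    return sum(d + shift for r, d in zip(responses, difficulties) if r == 1)

# Hypothetical sample-free difficulty estimates for four items.
difficulties = [-1.2, -0.3, 0.4, 1.1]
person_a = [1, 1, 0, 0]   # the two easiest items right
person_b = [0, 0, 1, 1]   # the two hardest items right

print(conventional_score(person_a), conventional_score(person_b))
print(difficulty_weighted_score(person_a, difficulties),
      difficulty_weighted_score(person_b, difficulties))
```

The two hypothetical persons tie under conventional scoring but differ under
the difficulty-weighted procedure, which is how the two methods can rank
subjects, and hence groups, differently.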

For both scoring procedures, a preliminary null hypothesis that the
variances for the three grades were equal was tested. For the conventional
scoring procedure the preliminary null hypothesis was not rejected (Bartlett's
F=.3, p>.05). For the Rasch model scoring procedure the preliminary null
hypothesis was not rejected (F=.8, p>.05). These results suggested that ANOVAs
could be conducted without violating the homogeneity of variance assumption.
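
A homogeneity of variance check of this general kind can be sketched with the
chi-square form of Bartlett's test. Note that this is an assumption made for
illustration: the text reports an F form of Bartlett's statistic, which the
sketch does not reproduce, and the scores below are hypothetical.

```python
import math
from statistics import variance

def bartlett_statistic(groups):
    """Bartlett's chi-square statistic for equality of k group variances."""
    k = len(groups)
    sizes = [len(g) for g in groups]
    variances = [variance(g) for g in groups]   # unbiased sample variances
    n_total = sum(sizes)
    pooled = sum((n - 1) * v for n, v in zip(sizes, variances)) / (n_total - k)
    numerator = (n_total - k) * math.log(pooled) - sum(
        (n - 1) * math.log(v) for n, v in zip(sizes, variances))
    correction = 1 + (sum(1 / (n - 1) for n in sizes)
                      - 1 / (n_total - k)) / (3 * (k - 1))
    return numerator / correction

# Hypothetical scores for three grade groups.
grade_2 = [5, 7, 6, 9, 8, 4]
grade_4 = [9, 11, 10, 12, 8, 14]
grade_6 = [10, 12, 11, 14, 9, 13]
statistic = bartlett_statistic([grade_2, grade_4, grade_6])
# 5.991 is the .05 critical value of chi-square with k - 1 = 2 degrees of freedom.
print(statistic < 5.991)
```

A statistic below the critical value means the hypothesis of equal variances
is not rejected, which is the condition under which the ANOVAs described here
were carried out.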

Since the grade-way was quantitative and the levels within the way were
equally spaced, a priori polynomial contrasts were applied to identify whether
or not any observed differences among the three means reflected either a linear
or a non-linear trend. ANOVA keyouts from both analyses are presented in Table
2. The Table 2 keyout illustrates that the results of the two procedures can
lead to different conclusions.
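
For three equally spaced levels, the linear and non-linear (quadratic) trend
contrasts use the orthogonal coefficients (-1, 0, 1) and (1, -2, 1). The
sketch below shows how each contrast's one-degree-of-freedom sum of squares
could be computed; the function name and the grade means are hypothetical,
and only the group size of 26 comes from the text.

```python
def contrast_sum_of_squares(means, sizes, coefficients):
    """Sum of squares (1 df) for a contrast among group means."""
    estimate = sum(c * m for c, m in zip(coefficients, means))
    scale = sum(c * c / n for c, n in zip(coefficients, sizes))
    return estimate ** 2 / scale

# Orthogonal polynomial coefficients for three equally spaced levels.
LINEAR = (-1, 0, 1)
NONLINEAR = (1, -2, 1)   # quadratic

# Hypothetical grade means, with 26 children per grade as in Experiment I.
means = [8.0, 11.0, 11.5]
sizes = [26, 26, 26]

ss_linear = contrast_sum_of_squares(means, sizes, LINEAR)
ss_nonlinear = contrast_sum_of_squares(means, sizes, NONLINEAR)
print(round(ss_linear, 2), round(ss_nonlinear, 2))
```

With equal group sizes the two sums of squares add up to the between-grades
sum of squares, and each is divided by the within-groups mean square to give
the F ratio for that trend.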

Insert Table 2 about here.

On a substantive level, the sample-free item difficulty scoring procedure
suggests that between grades two and four children improve in their ability to
interpret the pronoun "it." This finding is consistent with past research.
However, after the fourth grade there is apparently less motivation for children
to focus on highly specific syntactic features of the prose which they read.
This finding is consistent with a belief that as children become more proficient
at using syntactic rules, they focus more on an interactive combination of the
syntactic and the content features of prose (Pearson & Kamil, 1978). Of course,
this result may be a sampling artifact which would not be replicated in a
longitudinal study. The external validity of this result remains to be explored
in future research.

Experiment II 

The subjects in the second experiment were 91 second, fourth, and sixth
graders. Of the 91 subjects, ?4 were boys and ?7 were girls. The null
hypothesis of the study was that the mean ability score of the boys would not be
significantly different from the mean score of the girls. This hypothesis was
of limited substantive interest, but will facilitate discussion of some
additional features of latent trait methods. After the data were collected, the
data were analyzed to determine if any items or subjects behaved in a manner
which deviated substantially (α=.05) from Rasch model expectations. No items
were identified as "misfits," but one subject did deviate substantially from
expected performance (t=2.1, p<.05).

Table 3 presents the expected and the actual performance of the
"misfitting" subject on the 16 test items. The items are listed in order of
their sample-free difficulty estimates. Since the subject made seven correct
responses, it should be expected that the seven easiest items would have been
correctly answered while the remaining nine items would have been missed.
Instead, this individual missed two of the three easiest items and correctly
answered the two most difficult items. Wright and Stone (1979) might call this
a combined "sleeping," i.e., warm-up, and "guessing" pattern. Because the
subject deviated substantially from expected performance, the subject was
excluded from further analysis.
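
The comparison between expected and actual performance can be sketched as a
simple count of unexpected responses. This is a descriptive illustration
only, not the fit t statistic reported above; the function names are
hypothetical, and the response string is a made-up example with seven right
answers out of 16, ordered from the easiest item to the hardest.

```python
def expected_pattern(raw_score, n_items):
    """Responses expected when items are ordered from easiest to hardest:
    the easiest raw_score items right, the rest wrong."""
    return [1] * raw_score + [0] * (n_items - raw_score)

def unexpected_responses(actual):
    """Count unexpected failures and unexpected successes in a response
    string whose items are ordered from easiest to hardest."""
    expected = expected_pattern(sum(actual), len(actual))
    failures = sum(1 for a, e in zip(actual, expected) if e == 1 and a == 0)
    successes = sum(1 for a, e in zip(actual, expected) if e == 0 and a == 1)
    return failures, successes

# Hypothetical response string: 1 = right, 0 = wrong.
actual = [0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1]
print(unexpected_responses(actual))
```

Unexpected failures on easy items correspond to what the text calls
"sleeping," and unexpected successes on hard items to "guessing."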

Insert Table 3 about here.

Table 3 also illustrates that sample-free item difficulty estimates can
differ from the sample-bound item difficulty estimates. Some items with
identical sample-bound difficulty estimates have different sample-free values,
and vice versa. Of course, the magnitude of these differences will vary from
study to study, but clearly the different estimates will not necessarily be
similar.

In order to test the null hypothesis of the experiment, person ability
scores were first estimated in the conventional manner, i.e., by counting each
person's number of correct responses. The Rasch test-free person ability scores
were used for the alternative scoring procedure. In this study, the homogeneity
of variance assumption was not violated when either of the two scoring
procedures was used, and so one-way ANOVAs were performed. No significant sex
effect was identified using the conventional scoring procedure (F=.2, p>.05),
nor was a significant sex effect identified using the Rasch test-free person
ability scores (p>.05).

Discussion 

The two experiments demonstrate several important benefits which can be
derived by utilizing Rasch model measurement procedures in reading research. At
least two of these benefits merit particular emphasis. Progress in reading
research depends in the final analysis upon replication of findings across
various studies. The growing emphasis on replication has been reflected in some
recent essays (cf. Carver, 1973), and developing methodologies for empirically
integrating research studies (cf. Iverson & Walberg, 1979). The use of the
Rasch model facilitates this process, because item (or person) calibrations are
sample (or item) free, and consequently can be more sensibly combined across
studies.

Figure 1 provides a heuristic demonstration of such an integration. The
figure integrates the difficulty estimates for the different passages across the
two different samples. Of course, the person-free difficulty estimates may
themselves have important implications for psycholinguistic theory. For
example, Figure 1 suggests that forward referent order and noun-referent
structures are easier for children to interpret than backward referent order and
sentence-referent structure, respectively.

Insert Figure 1 about here.

The importance of the misfit statistics of the Rasch model is also
noteworthy. Reading researchers frequently eliminate subjects who do not meet
minimal ability criteria. Even when this is done, some subjects whose test
performance reflects either "sleeping" or "guessing" or both will unfortunately
be included in conventional analyses. Similarly, conventional analyses will not
identify "misbehaving" test items unless an item's behavior is genuinely
bizarre, e.g., everybody misses the item. However, the Rasch model integrates
expectations about item and person behavior, and provides test statistics for
evaluating deviations from expectations.

To date, "the major factors that have hindered widespread use of these
methods are the lack of familiarity on the part of practitioners and the lack of
user oriented computer programs" (Hambleton, et al., 1978, p. 503). However,
these difficulties can now be at least partially overcome by consulting one of
the recently published texts on latent trait measurement (cf. Wright & Stone,
1979), and by acquiring one of the recently developed computer programs which
implement these models. In summary, latent trait measurement models appear to
have some potentially helpful applications in psycholinguistic inquiry; these
potentials have not yet been fully realized.

Table 1

Structure Combinations

Order      Distance   Referent Type     Acronym
Forward    intrA      Noun phrase       FAN
Forward    intrA      Sentence/clause   FAS
Forward    inteR      Noun phrase       FRN
Forward    inteR      Sentence/clause   FRS
Backward   intrA      Noun phrase       BAN
Backward   intrA      Sentence/clause   BAS
Backward   inteR      Noun phrase       BRN
Backward   inteR      Sentence/clause   BRS

Hereafter the passages are arbitrarily each numbered "1" or "2." Thus,
"FAN1" refers to number one of two passages with a Noun phrase referent
presented in an intrA-sentence Forward referent-order (FAN) structure.



Table 2

ANOVA Keyouts for Grade-level Hypothesis

                             Sum of            Mean
Method         Source        Squares    df     Squares    F
Conventional   Linear        111.1       1     111.1      15.6***
               Non-linear     13.6       1      13.6       1.9
               Within                   75       7.1
Rasch          Linear           .5       1        .5        .3
               Non-linear      8.1       1       8.1       5.1*
               Within        118.2      75       1.6

*p<.05
**p<.01
***p<.001

Table 3

"Misfitting" Person's Performance

Item         Actual           Expected       Item Difficulties
Acronym(a)   Performance(b)   Performance    Rasch    Conventional(c)
BAN1         0                1              -2.?3    .93
FRN2         1                1              -1.79    .91
FRS2         0                1              -1.3?    .87
FRN1         1                1              -1.24    .87
FAN1         1                1               -.74    .81
FAS1         0                1               -.27    .74
FAS2         1                1               -.14    .64
BRN1         0                0               -.14    .71
FAN2         0                0                .30    .64
BAN2         0                0                .41    .62
BAS1         1                0                .47    .62
BRS1         0                0                .58    .58
BRN2         0                0                .96    .50
BRS2         0                0               1.50    .40
FRS1         1                0               1.73    .36
BAS2         1                0               2.15    .2?

(a) See Table 1 for acronym derivatives.
(b) Scored "1"=right; "0"=wrong.
(c) n of subjects correctly answering item divided by n of subjects.

Figure Caption and Note

Figure 1

Structures Arrayed Along Sample-free Difficulty Continuum

Note. See Table 1 for acronym derivatives. The passages presented in
Experiment I are identified by asterisks.

[Figure 1: a vertical difficulty scale running from -2.5 to 2.5, along which
the passage acronyms from both experiments are placed.]